A Partition-Based Suffix Tree Construction and Its Applications
Abstract
A suffix tree (also called a PAT tree or position tree; it is the path-compressed form of a suffix trie) is a powerful data structure that represents the suffixes of a given string in a way that allows fast implementations of important string operations. The idea behind suffix trees is to assign each symbol of a string an index corresponding to its position in the string: the first symbol has index 1 and the last symbol has index n, where n is the number of symbols in the string. These indexes, rather than the symbols themselves, are used to construct the suffix tree.

Suffix trees provide efficient access to all substrings of a string. They are used in string processing (string search, longest repeated substring, longest common substring, longest palindrome, etc.), text processing (editing, free-text search, etc.), data compression, and data clustering in search engines, among other areas. Suffix trees are important and popular data structures for processing long DNA sequences, and they are often used to efficiently solve a variety of computational biology and bioinformatics problems, such as searching for patterns in DNA or protein sequences, exact and approximate sequence matching, repeat finding, and anchor finding in genome alignment.

A suffix tree exposes the internal structure of a string in depth. It can be constructed and represented in time and space proportional to the length of the sequence, and it requires an affordable amount of memory: it can fit entirely in the main memory of present-day desktop computers. Linear construction time and space, together with a search time that depends on the length of the query pattern rather than on the length of the text, make suffix trees especially valuable. However, the construction process is space-demanding and can become prohibitive when the suffix tree must handle a huge number of long DNA sequences.
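To illustrate the index-based representation, here is a minimal sketch in Python (not the paper's algorithm): a naive quadratic-time construction in which every edge stores a (start, end) pair of positions into the text instead of a copy of the substring. It uses 0-based indexes, unlike the 1-based numbering above, and appends a sentinel `$` that is assumed not to occur in the input, so that every suffix ends at a leaf. The names `build_suffix_tree`, `contains` and `longest_repeated_substring` are hypothetical; the last two illustrate two of the applications listed above.

```python
class Node:
    """Suffix tree node; the edge into it is labelled text[start:end]."""

    def __init__(self, start=0, end=0, suffix_index=-1):
        self.start = start
        self.end = end
        self.children = {}                # first symbol of edge label -> child
        self.suffix_index = suffix_index  # start of the suffix (leaves only)


def build_suffix_tree(s):
    """Naive O(n^2) construction; assumes '$' does not occur in s."""
    text = s + "$"
    root = Node()
    for i in range(len(text)):            # insert suffix text[i:]
        node, pos = root, i
        while True:
            child = node.children.get(text[pos])
            if child is None:             # no edge starts with this symbol
                node.children[text[pos]] = Node(pos, len(text), i)
                break
            k = child.start               # walk down the edge while symbols match
            while k < child.end and text[k] == text[pos]:
                k += 1
                pos += 1
            if k == child.end:            # whole edge matched: descend
                node = child
                continue
            mid = Node(child.start, k)    # mismatch inside the edge: split it
            node.children[text[child.start]] = mid
            child.start = k
            mid.children[text[k]] = child
            mid.children[text[pos]] = Node(pos, len(text), i)
            break
    return root, text


def contains(root, text, pattern):
    """True if pattern is a substring of text (walks at most len(pattern) symbols)."""
    node, j = root, 0
    while j < len(pattern):
        child = node.children.get(pattern[j])
        if child is None:
            return False
        k = child.start
        while k < child.end and j < len(pattern):
            if text[k] != pattern[j]:
                return False
            k += 1
            j += 1
        node = child
    return True


def longest_repeated_substring(root, text):
    """Path label of the deepest internal node (it occurs at least twice)."""
    best_depth, best_end = 0, 0
    stack = [(root, 0)]
    while stack:
        node, depth = stack.pop()
        for child in node.children.values():
            if child.children:            # internal node
                d = depth + child.end - child.start
                if d > best_depth:
                    best_depth, best_end = d, child.end
                stack.append((child, d))
    return text[best_end - best_depth:best_end]
```

For example, `longest_repeated_substring(*build_suffix_tree("banana"))` returns `"ana"`, the label of the deepest internal node. Storing index pairs keeps every edge label constant-size regardless of its length, which is what makes the linear space bound possible.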
As the number of sequences to be handled grows, the performance of construction algorithms that use suffix links degrades, because following suffix links causes random memory access. For this reason, some approaches abandon suffix links entirely, giving up the theoretically superior linear construction time in favor of a quadratic-time algorithm with better locality of reference.
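The abstract does not spell out the partition-based construction itself, so the sketch below is only an assumption about what such an approach can look like: suffix start positions are grouped by their first symbol, and each group's subtree is built independently by plain top-down insertion, with no suffix links and quadratic worst-case time. Because the partitions are disjoint, the finished subtrees can simply be attached under a common root. All names (`insert_suffix`, `partitioned_suffix_tree`, `spelled_suffixes`) are hypothetical, and partitioning by a single first symbol is a simplification; practical systems may use longer prefixes so that each partition fits in memory.

```python
from collections import defaultdict


class Node:
    def __init__(self, start=0, end=0, leaf=-1):
        self.start, self.end = start, end   # edge label = text[start:end]
        self.children = {}                  # first symbol -> child
        self.leaf = leaf                    # suffix start index (leaves only)


def insert_suffix(root, text, i):
    """Top-down insertion of text[i:] without suffix links."""
    node, pos = root, i
    while True:
        child = node.children.get(text[pos])
        if child is None:
            node.children[text[pos]] = Node(pos, len(text), i)
            return
        k = child.start
        while k < child.end and text[k] == text[pos]:
            k, pos = k + 1, pos + 1
        if k == child.end:
            node = child
            continue
        mid = Node(child.start, k)          # split the edge at the mismatch
        node.children[text[child.start]] = mid
        child.start = k
        mid.children[text[k]] = child
        mid.children[text[pos]] = Node(pos, len(text), i)
        return


def partitioned_suffix_tree(s):
    """Assumes '$' does not occur in s."""
    text = s + "$"
    parts = defaultdict(list)               # first symbol -> suffix start positions
    for i, c in enumerate(text):
        parts[c].append(i)
    root = Node()
    for symbol, starts in parts.items():
        sub = Node()                        # built independently of other partitions
        for i in starts:
            insert_suffix(sub, text, i)
        # every suffix in this partition starts with `symbol`, so sub has one child
        root.children[symbol] = sub.children[symbol]
    return root, text


def spelled_suffixes(node, text, prefix=""):
    """(leaf suffix index, path label) pairs, useful for checking the result."""
    label = prefix + text[node.start:node.end]
    if not node.children:
        return [(node.leaf, label)]
    return [p for c in node.children.values()
            for p in spelled_suffixes(c, text, label)]
```

Each partition's insertions touch only that partition's subtree, which is roughly where the better locality of reference comes from; in a disk-based or parallel setting the partitions could be built and written out separately.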
Publication date: 2012